In [1]:
%%HTML
<style>
.container{width:90% !important;}
</style>
In [2]:
import pickle
from IPython.display import HTML
from bokeh.plotting import output_notebook
output_notebook()

In order to study the social interactions among users in the same company, we have built a directed network representation of our dataset. Each employee is a node in the network. Given two employees A and B, if employee A has liked or disliked a comment made by B, an edge from A to B is added to the network. The adjacency matrix of the resulting graph is used to perform a two-component NMF clustering with two different edge weights: the total number of interactions, and the number of likes divided by the total number of interactions. This clustering is performed on both the directed and the undirected versions of the adjacency matrices, giving rise to eight graph-based features (2 weightings × 2 graph versions × 2 components).

Graph representation of employee data


Fig 1

In the following representation of the dataset, the color of each node depends on the target variable, churn. A node is red if the corresponding employee churned within the following twelve weeks and blue otherwise.

Each edge is coloured according to the relative agreement of its interactions, ranging from red (relative agreement = 0) to green (relative agreement = 1). Each node has an alpha (opacity) value that scales with its number of interactions, ranging from 0.5 to 1.

It is also possible to zoom in and hover over the nodes and edges to display a tooltip.
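The notebook does not show how the edge colours and node alphas were computed. As a minimal sketch, assuming plain linear interpolation (red to green in RGB for relative agreement, and a min-max rescaling of interaction counts onto [0.5, 1] for alpha), the two mappings could look like this; `agreement_to_rgb`, `interactions_to_alpha` and the counts are hypothetical.

```python
# Hypothetical helpers: the notebook does not show its colour/alpha code,
# so both mappings below assume simple linear interpolation.


def agreement_to_rgb(rel_agreement):
    """Map a relative agreement in [0, 1] to RGB, from red (0) to green (1)."""
    if not 0.0 <= rel_agreement <= 1.0:
        raise ValueError("relative agreement must lie in [0, 1]")
    red = round(255 * (1.0 - rel_agreement))
    green = round(255 * rel_agreement)
    return (red, green, 0)


def interactions_to_alpha(n, n_min, n_max, lo=0.5, hi=1.0):
    """Linearly rescale an interaction count from [n_min, n_max] onto [lo, hi]."""
    if n_max == n_min:
        return hi  # all nodes have the same count: draw them fully opaque
    return lo + (hi - lo) * (n - n_min) / (n_max - n_min)


# Hypothetical interaction counts per employee.
counts = {"A": 2, "B": 10, "C": 6}
n_min, n_max = min(counts.values()), max(counts.values())
alphas = {node: interactions_to_alpha(n, n_min, n_max) for node, n in counts.items()}
print(alphas)                  # {'A': 0.5, 'B': 1.0, 'C': 0.75}
print(agreement_to_rgb(0.25))  # (191, 64, 0)
```

Rescaling onto [0.5, 1] rather than using strict proportionality keeps the least active employees visible at half opacity instead of fading them out completely.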

In [3]:
# Load the pickled, pre-rendered Bokeh figure (HTML markup) and display it inline
with open('bokeh/interactions_churn.pck', 'rb') as f:
    dataset = pickle.load(f)
HTML(dataset)
Out[3]:

Social interactions clustering

Clustering techniques will be applied to this network representation to extract social-interaction features. We perform Non-negative Matrix Factorization (NMF) with n_components=2 to extract two features per employee, each representing how strongly a given node belongs to one of the two clusters.

When performing NMF, we use several adjacency matrices of the network, each capturing a different aspect of the interactions performed by each employee.
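The `n_components=2` parameter name matches scikit-learn's `sklearn.decomposition.NMF`, which the notebook may well have used, but its code is not shown. To keep the example dependency-light, the sketch below instead implements the classic Lee and Seung multiplicative updates for the Frobenius loss in plain NumPy. It assumes, as scikit-learn's `fit_transform` does, that the per-node features are the rows of the factor `W`; the toy adjacency matrix is made up.

```python
import numpy as np


def nmf(A, n_components=2, n_iter=500, seed=0, eps=1e-10):
    """Factorize a non-negative matrix A (n x m) as A ~ W @ H with W, H >= 0.

    Lee and Seung multiplicative updates for the Frobenius loss; returns
    W (n x n_components) and H (n_components x m).
    """
    A = np.asarray(A, dtype=float)
    if (A < 0).any():
        raise ValueError("NMF requires a non-negative matrix")
    rng = np.random.default_rng(seed)
    n, m = A.shape
    W = rng.random((n, n_components))
    H = rng.random((n_components, m))
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative and are designed
        # not to increase the reconstruction error ||A - W @ H||_F.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy weighted adjacency: nodes 0-2 interact among themselves, as do nodes 3-5.
A = np.array([
    [0, 3, 2, 0, 0, 0],
    [3, 0, 4, 0, 0, 0],
    [2, 4, 0, 0, 0, 0],
    [0, 0, 0, 0, 5, 1],
    [0, 0, 0, 5, 0, 2],
    [0, 0, 0, 1, 2, 0],
], dtype=float)

W, H = nmf(A, n_components=2)
# Row i of W holds node i's loadings on the two clusters: two features per node.
print(W.round(2))
print(W.argmax(axis=1))  # dominant cluster of each node
```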

Directed and undirected graphs

When performing NMF clustering on a given edge attribute, we use the weighted adjacency matrix of the graph, with that attribute as the edge weight. Although our network is represented as a directed graph, we also perform NMF clustering on the corresponding undirected version of the network. We therefore end up with four features for each edge attribute studied: two representing the clusters of the directed network and two representing the clusters of the undirected network.
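A minimal sketch of the four-features-per-attribute step, assuming the undirected version is obtained by summing both directions (`A + A.T`). That choice fits interaction counts; the notebook does not say how the relative-agreement matrix was symmetrized. Here `nmf` is a compact multiplicative-update stand-in for a library NMF (such as scikit-learn's), and `graph_features` and the toy matrix are hypothetical.

```python
import numpy as np


def nmf(A, n_components=2, n_iter=500, seed=0, eps=1e-10):
    """Non-negative factorization A ~ W @ H via Lee and Seung multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], n_components))
    H = rng.random((n_components, A.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H


def graph_features(A, seed=0):
    """Return an (n x 4) array of per-node features for one edge attribute:
    columns 0-1 come from the directed matrix A, columns 2-3 from the
    undirected matrix A + A.T (weights summed over both directions)."""
    A = np.asarray(A, dtype=float)
    W_directed, _ = nmf(A, seed=seed)
    W_undirected, _ = nmf(A + A.T, seed=seed)
    return np.hstack([W_directed, W_undirected])


# Hypothetical directed weights: A[i, j] = interactions from employee i to employee j.
A = np.array([
    [0, 2, 0, 1],
    [0, 0, 3, 0],
    [1, 0, 0, 0],
    [0, 4, 0, 0],
])
features = graph_features(A)
print(features.shape)  # (4, 4)
```

Calling `graph_features` once per weighted matrix (interaction counts and relative agreement) yields the eight graph-based features described at the start.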

Weighting by number of interactions

We have extracted four features related to the number of interactions of each employee. Each edge is assigned a weight equal to the number of interactions (likes + dislikes) that occurred between the corresponding pair of users.
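A minimal sketch of how the interaction-count weights could be assembled, assuming a hypothetical log of `(voter, comment_author, is_like)` records; the notebook's actual data schema is not shown, and `interaction_count_matrix` is an illustrative name.

```python
import numpy as np

# Hypothetical interaction log: (voter, comment_author, is_like).
interactions = [
    ("ana", "ben", True),
    ("ana", "ben", False),
    ("ana", "cai", True),
    ("ben", "ana", True),
    ("cai", "ben", True),
    ("cai", "ben", True),
]


def interaction_count_matrix(records):
    """Directed adjacency where A[i, j] counts the likes + dislikes that
    employee i gave to comments written by employee j."""
    employees = sorted({e for voter, author, _ in records for e in (voter, author)})
    index = {name: k for k, name in enumerate(employees)}
    A = np.zeros((len(employees), len(employees)))
    for voter, author, _ in records:
        A[index[voter], index[author]] += 1  # every like or dislike is one interaction
    return employees, A


employees, A = interaction_count_matrix(interactions)
print(employees)  # ['ana', 'ben', 'cai']
print(A)
```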

We extracted two components from NMF clustering of the adjacency matrix of the directed weighted interaction graph, and two additional features from the adjacency matrix of the undirected interaction graph. The undirected version therefore only takes into account the number of interactions, not whether an interaction went from A to B or vice versa.

Fig 2.

In this graph, each edge is coloured according to its number of interactions, ranging from blue (low) to red (high).

The color of each node represents the normalized value of the first NMF component of the directed interaction graph. It uses a diverging palette that assigns orange to the lowest values of the attribute (0), black to intermediate values (0.5) and green to the highest values (1).
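The notebook does not show its normalization or palette code. As a sketch, assuming min-max normalization of the first NMF component and linear RGB interpolation through orange, black and green (the exact RGB values are assumptions), the node colours could be derived as follows; `first_component` holds made-up loadings and the helper names are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values linearly to [0, 1] (all zeros if the values are constant)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Assumed RGB anchors of the diverging palette.
ORANGE, BLACK, GREEN = (255, 165, 0), (0, 0, 0), (0, 128, 0)


def lerp(c0, c1, t):
    """Linear interpolation between two RGB tuples, t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))


def diverging_colour(x):
    """Orange at 0, black at 0.5, green at 1, interpolating linearly in RGB."""
    if x <= 0.5:
        return lerp(ORANGE, BLACK, x / 0.5)
    return lerp(BLACK, GREEN, (x - 0.5) / 0.5)


first_component = [0.12, 0.40, 0.26, 0.05]  # hypothetical NMF loadings, one per node
norm = min_max_normalize(first_component)
colours = [diverging_colour(x) for x in norm]
print(colours)
```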

In [4]:
with open('bokeh/NMF1_inters_d.pck','rb') as f:
    dataset = pickle.load(f)
HTML(dataset)
Out[4]:

Weighting by relative agreement

The relative agreement is defined as the number of likes that user A has given to user B divided by the total number of interactions from A to B. This represents a measure of how often an employee agrees with another one. Using this measure as weights of the adjacency matrix, we perform the same calculations as we did when we used the number of interactions as weight.
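Following the definition above, a sketch of the relative-agreement matrix built from a hypothetical log of `(voter, comment_author, is_like)` records. Pairs with no interactions are set to 0 here, which is an assumption: it makes "never interacted" indistinguishable from "always disliked", and the notebook does not say how it handled this case. `relative_agreement_matrix` is an illustrative name.

```python
import numpy as np

# Hypothetical interaction log: (voter, comment_author, is_like).
interactions = [
    ("ana", "ben", True),
    ("ana", "ben", False),
    ("ana", "cai", True),
    ("ben", "ana", True),
    ("cai", "ben", True),
    ("cai", "ben", True),
]


def relative_agreement_matrix(records):
    """R[i, j] = likes from i to j / all interactions (likes + dislikes) from i to j.

    Pairs that never interacted get 0 (an assumption).
    """
    employees = sorted({e for voter, author, _ in records for e in (voter, author)})
    index = {name: k for k, name in enumerate(employees)}
    n = len(employees)
    likes = np.zeros((n, n))
    total = np.zeros((n, n))
    for voter, author, is_like in records:
        i, j = index[voter], index[author]
        total[i, j] += 1
        if is_like:
            likes[i, j] += 1
    # Divide only where there was at least one interaction; leave 0 elsewhere.
    R = np.divide(likes, total, out=np.zeros_like(likes), where=total > 0)
    return employees, R


employees, R = relative_agreement_matrix(interactions)
print(employees)  # ['ana', 'ben', 'cai']
print(R[0, 1])    # 0.5: one like out of two interactions from ana to ben
```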

Fig 3.

In this figure the edges have the same colouring pattern as in Fig 1.

The color of each node represents the normalized value of the first NMF component of the relative agreement directed graph, ranging from blue (0) to red (1).

In [5]:
with open('bokeh/rel_agre_NMF1_d.pck','rb') as f:
    dataset = pickle.load(f)
HTML(dataset)
Out[5]:

Fig 4

This figure displays the same information as Fig 3, but in this case we have discretized the normalized NMF values into three groups:

  • Blue: normalized NMF from 0 to 0.33.
  • Black: normalized NMF from 0.33 to 0.66.
  • Orange: normalized NMF from 0.66 to 1.
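The three-way split can be sketched as a simple threshold function using the cut-offs 0.33 and 0.66 from the list. Which group a value lying exactly on a cut-off belongs to is not stated, so assigning it to the higher group is an assumption; `nmf_group` is a hypothetical name.

```python
def nmf_group(x, cuts=(0.33, 0.66)):
    """Map a normalized NMF value in [0, 1] to one of the three colour groups.

    Values exactly on a cut-off go to the higher group (an assumption; the
    text does not say which side the boundaries belong to).
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("expected a normalized value in [0, 1]")
    if x < cuts[0]:
        return "blue"
    if x < cuts[1]:
        return "black"
    return "orange"


print([nmf_group(x) for x in (0.1, 0.33, 0.5, 0.9)])  # ['blue', 'black', 'black', 'orange']
```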
In [6]:
with open('bokeh/rel_agree_3.pck','rb') as f:
    dataset = pickle.load(f)
HTML(dataset)
Out[6]:

Network representation of the employee interactions. Each edge is coloured according to its percentage of likes, ranging from red (low) to green (high). Each node is coloured by the first NMF component of the relative agreement graph: lower values are blue, medium values are black and the highest values are orange.
